Search Results: "Craig Small"

31 May 2017

Craig Small: The sudo tty bug and procps

There have been recent reports of a security bug in sudo (CVE-2017-1000367) where you can fool sudo about which controlling terminal it is running on and so bypass its security checks. One of the first things I thought of was: is procps vulnerable to the same bug? Sure, it wouldn't be a security bypass, but it would be a normal sort of bug. A lot of programs in procps have a concept of a controlling terminal, or the TTY field, for either viewing or filtering; could they be fooled into thinking the process had a different controlling terminal? Was I going to be in the same pickle as the sudo maintainers? The meat between the stat parsing sandwich? Can I find any more puns related somehow to the XKCD comic? TLDR: No.

How to find the tty

Most ways of finding what the controlling terminal for a process is are very similar. The file /proc/<PID>/stat is a one-line pseudo file that the kernel creates on access and that has information about the particular process. A typical file would look like:
20209 (bash) S 14762 20209 20209 34822 20209 4194304 32181 4846307 625 1602 66 30 16265 4547 20 0 1 0 139245105 25202688 1349 18446744073709551615 4194304 5242132 140737059557984 0 0 0 0 3670020 1266777851 1 0 0 17 1 0 0 280 0 0 7341384 7388228 39092224 140737059564618 140737059564628 140737059564628 140737059569646 0
The first field is the PID, the second is the process name (which may be different from the command line, but that's another story); then skip along to field #7 which in this case is 34822. Also notice the process name is in brackets; that is important. So 34822, how do we figure out what device this is? The number holds the major and minor device numbers of the controlling terminal. 34822 in hex is 8806, so the device has a major number of 88h or 136 and a minor number of 06. Most programs just scan the usual device directories until they find a match (which is basically how procps does it). Device 136,6 is /dev/pts/6
crw--w---- 1 user tty 136, 6 May 29 16:20 /dev/pts/6
$ ps -o tty,cmd 20209
TT CMD
pts/6 /bin/bash
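As an aside, here is a minimal C sketch (illustrative only, not the actual procps code) of turning that field #7 number into a major/minor pair, assuming the glibc major() and minor() macros from <sys/sysmacros.h>:

/* Decode a tty_nr value such as the 34822 above (0x8806). */
#include <stdio.h>
#include <sys/sysmacros.h>

int main(void)
{
    unsigned long tty_nr = 34822;   /* field #7 from the stat example */

    printf("major=%u minor=%u\n", major(tty_nr), minor(tty_nr));
    /* prints: major=136 minor=6, which is /dev/pts/6 */
    return 0;
}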
The Bug

The process of taking the raw stat file and producing a bunch of useful fields is called parsing. The bug in sudo was due to how they parsed the file. The stat file is a space-delimited file. The program scanned the file, character by character, until it came across the 6th space. The problem is, you can put spaces in your command name and fool sudo. Once you know that, you can make sudo think the program is running on any (or at least a different) controlling terminal. The bug reporters then used some clever symlinking and race techniques to get root.

What about procps?

The parsing of the current (as of writing) procps on the stat file is found in proc/readproc.c within the function stat2proc(). However, it is not just a simple sscanf or something that runs along the line looking for spaces. To find the command, the program does the following:
 S = strchr(S, '(') + 1;
 tmp = strrchr(S, ')');
 num = tmp - S;
 if(unlikely(num >= sizeof P->cmd)) num = sizeof P->cmd - 1;
 memcpy(P->cmd, S, num);
 P->cmd[num] = '\0';
 S = tmp + 2; // skip ") "
The sscanf then comes after we have found the command, using the variable S, to fill in the other fields including the controlling terminal device numbers. The procps library looks for the command within brackets, so if your program has spaces in it, it is still found. By using strrchr (effectively, find the last), you cannot fool it with a bracket in the command either. So procps is not vulnerable to this sort of trickery. Incidentally, the fix for the sudo bug now uses strrchr for a close bracket, so it solves the problem the same way. The check for the close bracket appeared in procps 3.1.4 back in 2002, though the stat2proc function was warning about oddly named processes before then. As it says in the 2002 change:
Reads /proc/*/stat files, being careful not to trip over processes with names like ":-) 1 2 3 4 5 6".
That's something we can all agree on!

7 February 2017

Craig Small: WordPress 4.7.2

When WordPress originally announced their latest security update, there were three security fixes. While all security updates can be serious, they didn't seem too bad. Shortly after, they updated their announcement with a fourth and more serious security problem. I have looked after the Debian WordPress package for a while and this is the first time I have heard of people actually having their sites hacked almost as soon as a vulnerability was announced. If you are running WordPress 4.7 or 4.7.1, your website is vulnerable and there are bots out there looking for it. You should immediately upgrade to 4.7.2 (or, if there is a later 4.7.x version, to that). There are now updated Debian wordpress 4.7.2 packages for unstable, testing and stable backports. For stable, you are on a patched version 4.1 which doesn't have this specific vulnerability (it was introduced in 4.7), but you should be using 4.1+dfsg-1+deb8u12 which has the fixes found in 4.7.1 ported back to the 4.1 code.

12 October 2016

Craig Small: axdigi resurrected

It seems funny to talk about 20 year old code that was a stop-gap measure to provide a bridging function the kernel had not (as yet) got, but here it is, my old bridge code. When I first started getting involved in Free Software, I was also involved with hamradio. In 1994 I released my first Free Software, or Open Source, program called axdigi. This program allowed you to digipeat, which was effectively source route bridging across hamradio packet networks. The code I used for this was originally network sniffer code to debug my PackeTwin kernel driver, but I got frustrated at there being no digipeating function within Linux, so I wrote axdigi, which is about 200 lines. The funny thing is, back then I thought it would be a temporary solution until digipeating got put into the kernel, which it temporarily did and then got removed. Recently some people asked me about axdigi and whether there is an official place where the code lives. The answer is really that the last axdigi was 0.02, written in July 1995. It seems strange to resurrect 20 year old code but it is still in use, though it does show its age. I've done some quick work on getting rid of the compiler warnings but there is more to do. So now axdigi has a nice shiny new home on GitHub, at https://github.com/csmall/axdigi

11 October 2016

Craig Small: Changing Jabber IDs

I've shuffled some domains around, using less of enc.com.au and more of my new domain dropbear.xyz. The website should work with both, but the primary domain is dropbear.xyz. Another change is my Jabber ID, which used to be csmall at enc but is now the same username at dropbear.xyz. I think I have done all the required changes in prosody for it to work, even with a certbot certificate!

10 July 2016

Craig Small: procps 3.3.12

The procps developers are happy to announce that version 3.3.12 of procps was released today. This version has a mixture of bug fixes and enhancements. This unfortunately means another API bump, but we are hoping this will be fixed with the new library API coming soon. procps is developed on GitLab; the newlib branch can be found at https://gitlab.com/procps-ng/procps/tree/newlib and procps 3.3.12 can be found at https://gitlab.com/procps-ng/procps/tags/v3.3.12 From the NEWS file, procps 3.3.12 has the following: we are hoping this will be the last release to use the old API and the new format API (imaginatively called newlib) will be used in subsequent releases. Feedback for this and any other version of procps can be sent to either the issue tracker or the development email list.

7 May 2016

Craig Small: Displaying Linux Memory

Memory management is hard, but RAM management may be even harder. Most people know the vague overall concept of how memory usage is displayed within Linux. You have your total memory, which is everything inside the box; then there is used and free, which is what the system is or is not using respectively. Some people might know that not all used is used and some of it actually is free. It can be very confusing to understand, even for someone who maintains procps (the package that contains top and free, two programs that display memory usage). So, how does the memory display work?

What free shows

The free program is part of the procps package. Its central goal is to give a quick overview of how much memory is used where. A typical output (e.g. what I saw when I typed free -h) could look like this:
      total   used    free   shared  buff/cache  available
Mem:    15G   3.7G    641M     222M         11G        11G
Swap:   15G   194M     15G
I've used the -h option for human-readable output here for the sake of brevity and because I hate typing long lists of long numbers. People who have good memories (or old computers) may notice there is a missing -/+ buffers/cache line. This was intentionally removed in mid-2014 because, as the memory management of Linux got more and more complicated, these lines became less relevant. They used to help with the "not used" used memory problem mentioned in the introduction, but progress caught up with them. To explain what free is showing, you need to understand some of the underlying statistics that it works with. This isn't a lesson on how Linux manages its memory (the honest short answer is, I don't fully know) but just enough, hopefully, to understand what free is doing. Let's start with the two simple columns first: total and free.

Total Memory

This is what memory you have available to Linux. It is almost, but not quite, the amount of memory you put into a physical host or the amount of memory you allocate for a virtual one. Some memory you just can't have, either due to early reservations or devices shadowing the memory area. Unless you start mucking around with those settings or the virtual host, this number stays the same.

Free Memory

Memory that nobody at all is using. They haven't reserved it, haven't stashed it away for future use or even just, you know, actually used it. People often obsess about this statistic but it's probably the most useless one to use for anything directly. I have even considered removing this column, or replacing it with available (see later for what that is), because of the confusion this statistic causes. The reason for its uselessness is that Linux has memory management where it allocates memory it doesn't use. This decrements the free counter but it is not truly "used"; if your application needs that memory, it can be given back. A very important statistic to know for running a system is how much memory I have got left before I either run out or start to seriously swap stuff to swap drives. Despite its name, this statistic will not tell you that and will probably mislead you. My advice is, unless you really understand the Linux memory statistics, ignore this one.

Who's Using What

Now we come to the components that are using (if that is the right word) the memory within a system.

Shared Memory

Shared memory is often thought of only in the context of processes (and makes working out how much memory a process uses tricky, but that's another story) but the kernel has this as well. The shared column lists this, which is a direct report from the Shmem field in the meminfo file.

Slabs

For things used a lot within the kernel, it is inefficient to keep going to get small bits of memory here and there all the time. The kernel has this concept of slabs where it creates small caches for objects or in-kernel data structures that slabinfo(5) states [such as] buffer heads, inodes and dentries. So basically kernel stuff for the kernel to do kernelly things with. Slab memory comes in two flavours: reclaimable and unreclaimable. This is important because unreclaimable cannot be handed back if your system starts to run out of memory. Funnily enough, not all reclaimable is, well, reclaimable. A good estimate is you'll only get 50% back; top and free ignore this inconvenient truth and assume it can be 100%. All of the reclaimable slab memory is considered part of the Cached statistic. Unreclaimable is memory that is part of Used.
Page Cache and Cached

Page caches are used to read and write to storage, such as a disk drive. These are the things that get written out when you use sync and that make the second read of the same file much faster. An interesting quirk is that tmpfs is part of the page cache, so the Cached column may increase if you have a few of these. The Cached column may seem like it should only have the Page Cache, but the Reclaimable part of the Slab is added to this value. Some older versions of some programs count either none or all of Slab in Cached; both of those are incorrect. Cached makes up part of the buff/cache column with the standard options for free, or has a column to itself with the wide option.

Buffers

The second component of the buff/cache column (or a separate one with the wide option) is kernel buffers. These are the low-level I/O buffers inside the kernel. Generally they are small compared to the other components and can basically be ignored, or just considered part of the Cached, which is the default for free.

Used

Unlike most of the previous statistics, which are either directly pulled out of the meminfo file or involve some simple addition, the Used column is calculated and completely dependent on the other values. As such it is not telling the whole story here, but it is a reasonably OK estimate of used memory. The Used component is what you have left of your Total memory once you have removed the Free memory, the Cached value (page cache plus reclaimable slab) and the Buffers. Notice that the unreclaimable part of slab is not in this calculation, which means it is part of the used memory. Also note this seems a bit of a hack because, as the memory management gets more complicated, the estimates used become less and less real.

Available

In early 2014, the kernel developers took pity on us toolset developers and gave us a much cleaner, simpler way to work out some of these values (or at least I'd like to think that's why they did it). The available statistic is the right way to work out how much memory you have left. The commit message explains the gory details about it, but the great thing is that if they change their mind or add some new memory feature the available value should be changed as well. We don't have to worry about whether all of slab should be in Cached and whether it is part of Used or not; we have a number directly out of meminfo.

What does this mean for free?

Poor old free is now at least 24 years old and it is based upon BSD and SunOS predecessors that go back way before then. People expect that their system tools don't change by default and show the same thing over and over. On the other side, Linux memory management has changed dramatically over those years. Maybe we're all just sheep (see, I had to mention sheep or RAMs somewhere in this) and like things to remain the same always. Probably if free was written now, it would only need the total, available and used columns, with used merely being total minus available; possibly with some other columns for the wide option. The code itself (found in libprocps) is not very hard to maintain, so it's not like this change will save much time, but for me I'm unsure if free is giving the right and useful result for the people that use it.
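To make the Used arithmetic concrete, here is a small illustrative C sketch (not the libprocps code; it assumes the field names a reasonably recent kernel exposes in /proc/meminfo) that derives used as total minus free, buffers and cache, where cache is page cache plus reclaimable slab, and reads available directly:

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    char line[256];
    unsigned long long total = 0, freemem = 0, avail = 0,
                       buffers = 0, cached = 0, sreclaim = 0;

    if (!f)
        return 1;
    while (fgets(line, sizeof(line), f)) {
        /* each sscanf only matches its own field; others leave the value alone */
        sscanf(line, "MemTotal: %llu kB", &total);
        sscanf(line, "MemFree: %llu kB", &freemem);
        sscanf(line, "MemAvailable: %llu kB", &avail);
        sscanf(line, "Buffers: %llu kB", &buffers);
        sscanf(line, "Cached: %llu kB", &cached);
        sscanf(line, "SReclaimable: %llu kB", &sreclaim);
    }
    fclose(f);

    /* "cache" as described above: page cache plus reclaimable slab */
    unsigned long long cache = cached + sreclaim;
    unsigned long long used = total - freemem - buffers - cache;

    printf("total=%llu used=%llu free=%llu buff/cache=%llu available=%llu (kB)\n",
           total, used, freemem, buffers + cache, avail);
    return 0;
}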

28 January 2016

Craig Small: pidof lost a shell

pidof is a program that reports the PID of a process that has the given command line. It has an option, -x, which means "scripts too". The idea behind this is that if you have a shell script it will find it. Recently there was an issue raised saying pidof was not finding a shell script. Trying it out, pidof indeed could not find the sample script but found other scripts; what was going on?

What is a script?

Seems pretty simple really: a shell script is a text file that is interpreted by a shell program. At the top of the file you have a hash bang line which starts with #! and then the name of the shell that is going to interpret the text. When you use the -x option, pidof uses the following code:
          if (task.cmd &&
                    !strncmp(task.cmd, cmd_arg1base, strlen(task.cmd)) &&
                    (!strcmp(program, cmd_arg1base) ||
                    !strcmp(program_base, cmd_arg1) ||
                    !strcmp(program, cmd_arg1)))
What this means is: match if the process comm (task.cmd) and the basename (strip the path) of argv[1] match, and one of the following holds: the program equals the basename of argv[1], the basename of the program equals argv[1], or the program equals argv[1].

The Hash Bang Line

Most scripts I have come across start with a line like
#!/bin/sh
Which means use the normal shell interpreter (on my system, dash). What was different was that the test script had a first line of
#!/usr/bin/env sh
Which means run the program sh in a new environment. Could this be the difference? The first type of script has the following procfs files:
$ cat -e /proc/30132/cmdline
/bin/sh^@/tmp/pidofx^@
$ cut -f 2 -d' ' /proc/30132/stat
(pidofx)
The first line picks up argv[1] "/tmp/pidofx" while the second finds comm "pidofx". The primary matching is satisfied, as well as the first dot-point, because the basename of argv[1] is "pidofx". What about the script that uses env?
$ cat -e /proc/30232/cmdline
bash^@/tmp/pidofx^@
$ cut -f 2 -d' ' /proc/30232/stat
(bash)
The comm "bash" does not match the basename of argv[1], so this process is not found.

How many execve?

So the proc filesystem is reporting the scripts differently depending on the first line, but why? The fields change depending on what process is running, and that is dependent on the execve function calls. A typical script has a single execve call; the strace output shows:
29332 execve("./pidofx", ["./pidofx"], [/* 24 vars */]) = 0
While the env has a few more:
29477 execve("./pidofx", ["./pidofx"], [/* 24 vars */]) = 0
 29477 execve("/usr/local/bin/sh", ["sh", "./pidofx"], [/* 24 vars */]) = -1 ENOENT (No such file or directory)
 29477 execve("/usr/bin/sh", ["sh", "./pidofx"], [/* 24 vars */]) = -1 ENOENT (No such file or directory)
 29477 execve("/bin/sh", ["sh", "./pidofx"], [/* 24 vars */]) = 0
The first execve is the same for both, but then env is called and it goes on its merry way to find sh. After trying /usr/local/bin and /usr/bin it finds sh in /bin and execs this program. Because there are two successful execve calls, the procfs fields are different.

What Next?

So now the mystery of pidof missing scripts has a reasonable explanation. The problem is, how to fix pidof? There doesn't seem to be a fix that isn't a kludge. Hard-coding potential script names seems just evil, but there doesn't seem to be a way to differentiate between a script using env and, say, vi ./pidofx. If you have some ideas, comment below or in the issue on gitlab.
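If you want to poke at this yourself, here is a small C sketch (not the pidof source; it only dumps the /proc files described above) that prints the comm and argv values that pidof -x ends up comparing for a given PID:

#include <libgen.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char path[64], comm[64] = "", cmdline[4096];
    FILE *f;
    size_t len;

    if (argc < 2)
        return 1;

    /* comm: what appears inside the brackets in /proc/<pid>/stat */
    snprintf(path, sizeof(path), "/proc/%s/comm", argv[1]);
    if ((f = fopen(path, "r"))) {
        if (fgets(comm, sizeof(comm), f))
            comm[strcspn(comm, "\n")] = '\0';
        fclose(f);
    }

    /* cmdline: the NUL-separated argv[] of the process */
    snprintf(path, sizeof(path), "/proc/%s/cmdline", argv[1]);
    if (!(f = fopen(path, "r")))
        return 1;
    len = fread(cmdline, 1, sizeof(cmdline) - 1, f);
    fclose(f);
    cmdline[len] = '\0';

    char *arg0 = cmdline;
    char *arg1 = (strlen(arg0) + 1 < len) ? arg0 + strlen(arg0) + 1 : NULL;

    printf("comm: %s\n", comm);
    printf("argv[0]: %s\n", arg0);
    if (arg1) {
        printf("argv[1]: %s\n", arg1);
        printf("argv[1] basename: %s\n", basename(arg1));
    }
    return 0;
}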

30 December 2015

Craig Small: Forking processes and Gtk2

I made a change recently to the gjay program. gjay is a gtk program that basically analyzes your music and makes playlists. There is a gui front-end and an analyzer back-end and they communicate through a pipe. One really useful debugging option gtk has is to make warnings fatal, so when gtk finds one it crashes at that point and you can use gdb to trap it. The flag is --g-fatal-warnings. I have been updating gjay and initially it didn't have this option, so I needed to add the gtk options, which is a simple one-liner. But then gjay gave some odd errors about XCB. Often, but not every time, when gjay started the following cryptic messages appeared on stderr:
[xcb] Unknown sequence number while processing queue
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
forktest: ../../src/xcb_io.c:274: poll_for_event: Assertion `!xcb_xlib_threads_sequence_lost' failed.
OK, so part of my init sequence was unhappy. The odd thing was it only appeared when I added the gtk options. I narrowed it down with some test code which displayed the error but stripped all the other parts out.
  1. #include <gtk/gtk.h>
  2.  
  3. int main(int argc, char *argv[])
  4. {
  5.     GOptionContext *context;
  6.     GError *error;
  7.     pid_t pid;
  8.     GtkWidget *win;
  9.  
  10.     context = g_option_context_new("My test");
  11.     g_option_context_add_group (context, gtk_get_option_group (TRUE));
  12.     error = NULL;
  13.     if (!g_option_context_parse(context, &argc, &argv, &error))
  14.         return 1;
  15.     pid = fork();
  16.     if (pid < 0)
  17.         return 1;
  18.     if (pid == 0) {   // child
  19.         GMainLoop *loop;
  20.  
  21.         loop = g_main_new(FALSE);
  22.         g_main_run(loop);
  23.         return 0;
  24.     } else {   // parent
  25.         if (gtk_init_check(&argc, &argv) == FALSE)
  26.             return 1;
  27.         win = gtk_window_new(GTK_WINDOW_TOPLEVEL);
  28.         gtk_widget_show(win);
  29.         gtk_main();
  30.     }
  31.     return 0;
  32. }
What we've got going on here is a simple gtk program. Line 11, with the gtk_get_option_group() call, was the line I added. The problem is that the child process is not quite set up, and when the parent goes into gtk_main you get those xcb errors. The options need to be parsed before the fork, because one of the options is to not fork (it just runs the child code directly and uses stdout with no GUI). Gjay is obviously a lot more complex than this, but follows the same pattern: there is the front-end looping through gtk_main and the back-end looping on g_main_run. The back-end needs to use glib, not gtk, as there is a non-gui option which can be used. The solution is actually (but not obviously to me) in the documentation for gtk_get_option_group. The parameter for that function is "whether to open the default display when parsing the commandline arguments". Changing the TRUE to FALSE on line 11 stops XCB from complaining! The screen still appears fine, but no more strange XCB messages.

14 November 2015

Craig Small: Mixing pysnmp and stdin

Depending on the application, sometimes you want to have some socket operations going (such as loading a website) and have stdin being read. There are plenty of examples of this in python, which usually boil down to making stdin behave like a socket and mixing it into the list of sockets select() cares about. A while ago I asked an email list whether I could have pysnmp use a different socket map so I could add my own sockets in (UDP, TCP and a zmq to name a few) and Ilya, the author of pysnmp, explained how pysnmp can use a foreign socket map. The sample code below is merely a mixture of Ilya's example code and the way stdin gets mixed into the fold. I have also updated to the high-level pysnmp API, which explains the slight differences in the calls.
  1. from time import time
  2. import sys
  3. import asyncore
  4. from pysnmp.hlapi import asyncore as snmpAC
  5. from pysnmp.carrier.asynsock.dispatch import AsynsockDispatcher
  6.  
  7.  
  8. class CmdlineClient(asyncore.file_dispatcher):
  9.     def handle_read(self):
  10.         buf = self.recv(1024)
  11.         print "you said {0}".format(buf)
  12.  
  13.  
  14. def myCallback(snmpEngine, sendRequestHandle, errorIndication,
  15.                errorStatus, errorIndex, varBinds, cbCtx):
  16.     print "myCallback!!"
  17.     if errorIndication:
  18.         print(errorIndication)
  19.         return
  20.     if errorStatus:
  21.         print('%s at %s' % (errorStatus.prettyPrint(),
  22.               errorIndex and varBinds[int(errorIndex)-1] or '?')
  23.              )
  24.         return
  25.  
  26.     for oid, val in varBinds:
  27.         if val is None:
  28.             print(oid.prettyPrint())
  29.         else:
  30.             print('%s = %s' % (oid.prettyPrint(), val.prettyPrint()))
  31.  
  32. sharedSocketMap = {}
  33. transportDispatcher = AsynsockDispatcher()
  34. transportDispatcher.setSocketMap(sharedSocketMap)
  35. snmpEngine = snmpAC.SnmpEngine()
  36. snmpEngine.registerTransportDispatcher(transportDispatcher)
  37. sharedSocketMap[sys.stdin] = CmdlineClient(sys.stdin)
  38.  
  39. snmpAC.getCmd(
  40.     snmpEngine,
  41.     snmpAC.CommunityData('public'),
  42.     snmpAC.UdpTransportTarget(('127.0.0.1', 161)),
  43.     snmpAC.ContextData(),
  44.     snmpAC.ObjectType(
  45.         snmpAC.ObjectIdentity('SNMPv2-MIB', 'sysDescr', 0)),
  46.     cbFun=myCallback)
  47.  
  48. while True:
  49.     asyncore.poll(timeout=0.5, map=sharedSocketMap)
  50.     if transportDispatcher.jobsArePending() or transportDispatcher.transportsAreWorking():
  51.         transportDispatcher.handleTimerTick(time())
Some interesting lines from the above code: lines 32-34 create a shared socket map and hand it to the pysnmp transport dispatcher, line 37 adds the stdin dispatcher into that same map, and lines 48-51 run the asyncore poll loop over the shared map while giving pysnmp its timer ticks. With all this I can handle keyboard presses and network traffic, such as a simple SNMP poll.

26 October 2015

Craig Small: ps standards and locales

I looked at two interesting issues today around the ps program in the procps project. One had a solution; the other I'm puzzled about.

ps User-defined Format

Issue #9 was quite the puzzle. The output of ps changed depending on whether a different option had a hyphen before it or not. First, the expected output:
$ ps p $$ -o pid=pid,comm=comm
 pid comm
31612 bash
Next, the unusual output.
$ ps -p $$ -o pid=pid,comm=comm
pid,comm=comm
 31612
The difference being that in the second we have -p, not p. The second unexpected thing about this is that it was designed that way. The Unix98 standard apparently permits this sort of craziness. To me it is a useless feature that will more likely than not confuse people. Within ps, depending on what sort of flags you start with, you use a sysv parser or a bsd parser. One of them triggered the Unix98 compatibility option while the other did not, hence the change in behavior. The next version of procps will ship with a ps that has the user-defined output format of the first example. I had a look at the latest standard, IEEE 1003.1-2013, which doesn't seem to have anything like that in it.

Short Month Length

This one has got me stuck. A user has reported in issue #5 that when they use their locale, columns such as start time get mis-aligned because their short month is longer than 3 characters. They also mention some other columns for memory etc. are not long enough, but that's a generic problem that is impossible to fix sensibly. OK, for the month problem the fix would be to know what the month length is and set the column width for those columns to that plus a bit more for the other fixed components; simple really. Except: how do you know, for a specific locale, what their short month length is? I always assumed it was three! I haven't found anything that has this information. Note, I'm not looking for strlen() but some function that has the maximum length for short month names (e.g. Jan, Feb, Mar etc). This also got me thinking how safe some of those date-to-string functions are if you have a static buffer as the destination. It's not safe to assume they will be DD MMM YYYY because there might be more Ms. So if you know how to work out the short month name length, let me know!
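One possible probe, offered only as a hedged sketch (it assumes, as on glibc, that ABMON_1 through ABMON_12 are consecutive nl_item values), would be to ask the current locale for its abbreviated month names with nl_langinfo() and measure the widest one in display columns:

#define _XOPEN_SOURCE 700
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

int main(void)
{
    setlocale(LC_ALL, "");           /* use the user's locale */
    int widest = 0;

    for (int i = 0; i < 12; i++) {
        const char *abmon = nl_langinfo(ABMON_1 + i);
        wchar_t wbuf[64];
        size_t n = mbstowcs(wbuf, abmon, 64);
        if (n == (size_t)-1)
            continue;                /* undecodable name, skip it */
        int w = wcswidth(wbuf, n);   /* display columns, not bytes */
        if (w > widest)
            widest = w;
    }
    printf("widest abbreviated month: %d columns\n", widest);
    return 0;
}

Whether something like that is the right answer for ps is another question, but it would at least give a per-locale number to size the column with.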

8 September 2015

Craig Small: WordPress 4.3 getting slow you might need an update

I recently received Debian bug report #798350 where the user had a problem with wordpress. After upgrading to version 4.3, the webserver's performance degrades over time. The problem is also reported at the wordpress site as WordPress ticket 33423, including the fix. I have backported the relevant changeset and uploaded Debian wordpress package 4.3+dfsg-2 which only contains this changeset. A lot of people, including myself, probably won't hit this bug, but if it impacts you, then try this update.

4 September 2015

Guido Günther: Debian work in August 2015

Debian LTS

August was the fourth month I contributed to Debian LTS under the Freexian umbrella; in total I spent four hours working on it. Besides that I did CVE triaging of 9 CVEs to check if and how they affect oldoldstable security as part of my LTS front desk work. Debconf 15 was a great opportunity to meet some of the other LTS contributors in person and to work on some of my packages.

Git-buildpackage

git-buildpackage gained buildpackage-rpm based on the work by Markus Lehtonnen, and merging of mock support is hopefully around the corner. Debconf had two gbp skill shares hosted by dkg and a BoF by myself; a summary is here. Integration with dgit (as discussed with Ian) looks doable and I have parts of that on my todo list as well. Among other things gbp import-orig gained a --merge-mode option so you can replace the upstream branches verbatim on your packaging branch but keep the contents of the debian/ directory.

Libvirt

I prepared an update for libvirt in Jessie fixing a crasher bug and QEMU error reporting. apparmor support now works out of the box in Jessie (thanks to intrigeri and Felix Geyer for that). Speaking of apparmor, I learned enough at Debconf to use this now by default, so we hopefully see less breakage in this area when new libvirt versions hit the archive. The bug count went down quite a bit and we have a new version of virt-manager in unstable now as well. As usual I prepared the RC candidates of libvirt 1.2.19 in experimental and 1.2.19 final is now in unstable.

9 August 2015

Craig Small: procps 3.3.11

I have updated NEWS, bumped the API and tagged in git; procps version 3.3.11 is now released! In this release we have fixed many bugs and made procps more robust for those odd corner cases; see the NEWS file for details. The most significant new feature in this release is the support for LXC containers in both ps and top. The source files can be found at both sourceforge and gitlab. My thanks to the procps co-maintainers, bug reporters and merge/patch authors.

What's Next?

There has been a large amount of work on the library API. This is not visible in this release as it is on a different git branch called newlib. The library is getting a complete overhaul and will look completely different to the old libproc/libprocps set. A decision hasn't been made when the newlib branch will merge into master, but we will do it once we're happy the library and its API have settled. This change will be the largest change to the procps library in its 20-odd year history, but it will mean the library will use common modern practices for libraries.

8 August 2015

Craig Small: Be careful with errno

I'm getting close to releasing version 3.3.11 of procps. When it gets near that time, I generally browse the Debian Bug Tracker again for procps bugs. Bug number #733758 caught my eye. With the free command, if you used the -s option before the -c option, the -s option failed with "seconds argument 'N' failed", where N was the number you typed in. That error should only appear when you try to type letters for the number of seconds. Seemed reasonably simple to test and simple to fix.

Take me to the code

The relevant code looks like this:
   case 's':
            flags |= FREE_REPEAT;
            args.repeat_interval = (1000000 * strtof(optarg, &endptr));
            if (errno || optarg == endptr || (endptr && *endptr))
                xerrx(EXIT_FAILURE, _("seconds argument '%s' failed"), optarg);
Seems a pretty stock-standard sort of function: use strtof() to convert the string into a float. You need to check both errno AND optarg == endptr because errno only catches an under/overflow, while optarg == endptr catches the case where no digits were converted at all. At first I thought the logic was wrong, but tracing through it was fine. I then compiled free using the upstream git source, and the program worked fine with the -s flag and no -c flag. Doing a diff between the upstream HEAD and Debian's 3.3.10 source showed nothing obvious. I then shifted the upstream git to 3.3.10 too and re-compiled. The Debian source failed, the upstream parsed the -s flag fine. I ran diff, no change. I ran md5sum, the hashes matched; what is going on here?

I'll set errno when I want

The man page says "in the case of under/overflow ERANGE is stored in errno". What this means is that if there isn't an under/overflow then errno is NOT set to 0, it's just not set at all. This is quite useful when you have a chain of functions and you just want to know something failed, but don't care what. Most of the time you generally would have a "Have I failed?" test and then check errno for why. A typical example is socket calls, where anything less than 0 means failure: you check the return value first and then errno. strtof() is one of those funny ones where most people check errno directly; it's simpler than checking for +/- HUGE_VAL. You can see though that there are traps.

What's the difference?

OK, so a simple errno=0 above the call fixes it, but why would the Debian source tree have this failure and the upstream not? Even with the same code? The difference is how they are compiled. The upstream compiles free like this:
gcc -std=gnu99 -DHAVE_CONFIG_H -I. -include ./config.h -I./include -DLOCALEDIR=\"/usr/local/share/locale\" -Iproc -g -O2 -MT free.o -MD -MP -MF .deps/free.Tpo -c -o free.o free.c
mv -f .deps/free.Tpo .deps/free.Po
/bin/bash ./libtool --tag=CC --mode=link gcc -std=gnu99 -Iproc -g -O2 ./proc/libprocps.la -o free free.o strutils.o fileutils.o -ldl
libtool: link: gcc -std=gnu99 -Iproc -g -O2 -o .libs/free free.o strutils.o fileutils.o ./proc/.libs/libprocps.so -ldl
While Debian has some hardening flags:
gcc -std=gnu99 -DHAVE_CONFIG_H -I. -include ./config.h -I./include -DLOCALEDIR=\"/usr/share/locale\" -D_FORTIFY_SOURCE=2 -Iproc -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -MT free.o -MD -MP -MF .deps/free.Tpo -c -o free.o free.c
mv -f .deps/free.Tpo .deps/free.Po
/bin/bash ./libtool --tag=CC --mode=link gcc -std=gnu99 -Iproc -g -O2 -fstack-protector-strong -Wformat -Werror=format-security ./proc/libprocps.la -Wl,-z,relro -o free free.o strutils.o fileutils.o -ldl
libtool: link: gcc -std=gnu99 -Iproc -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wl,-z -Wl,relro -o .libs/free free.o strutils.o fileutils.o ./proc/.libs/libprocps.so -ldl
It's not the compiling of free itself that is doing it, but the library. Most likely something that is called before the strtof() is setting errno, which this code then falls into. In fact, if you run the upstream free linked against the Debian procps library, it fails. The moral of the story is to set errno before the function is called if you are going to depend on it for checking whether the function succeeded.
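For reference, a minimal sketch of the safer pattern described above (illustrative only, not the literal free.c change): clear errno immediately before the call, then check it together with endptr.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    if (argc < 2)
        return 1;

    char *endptr;
    errno = 0;                       /* clear any stale value first */
    float seconds = strtof(argv[1], &endptr);

    if (errno || argv[1] == endptr || (endptr && *endptr)) {
        fprintf(stderr, "seconds argument '%s' failed\n", argv[1]);
        return 1;
    }
    printf("interval: %f seconds\n", seconds);
    return 0;
}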

13 June 2015

Craig Small: Linux 4.0 ate my docker images

I have previously written about the gitlab CI runners that use docker. Yesterday I made some changes to procps and pushed them to gitlab, which would then start the CI. This morning I checked and it said the build failed; OK, so that's not terribly unusual. The output from the runner was:
gitlab-ci-multi-runner 0.3.3 (dbaf96f)
Using Docker executor with image csmall/testdebian ...
Pulling docker image csmall/testdebian ...
Build failed with Error: image csmall/testdebian: not found
Hmm, I know I have that image, it must just be the runner. So, let's see what images I have:
$ docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
Now, I know I have images, I had about 10 or so of them; where did they go? I even looked in the /var/lib/docker directories and can see the json configs, so what have you done with my images, docker?

Storage Drivers

The first hint came from stackexchange, where someone lost their AUFS images and needed to load the aufs kernel module. Now I know there are two places, or methods, where docker stores its images. They are called aufs and devicemapper. There is some debate around which one is better and, to be honest, with what I do I don't much care, I just want it to work. The version of the kernel is significant. It seems the default storage driver was AUFS and this requires the aufs.ko kernel module. Linux 4.0 (the version shipped with Debian) does NOT have that module, or at least I couldn't find it. For new images, this isn't a problem: Docker will just create the new images using devicemapper and everyone is happy. The problem is where you have old aufs images, like me. I want those images.

Rescue the Images

I'm not sure if this is the best or most correct way of getting your images back, but for me it worked. I got the idea basically from someone who wanted to switch from aufs to devicemapper images for other reasons. You first need to reboot and select at the grub prompt a 3.x kernel that has aufs support. Then when the system comes up, you should see all your images, like this:
$ docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
csmall/testdebian latest 6979033105a4 5 weeks ago 369.4 MB
gcc 5.1 b063030b23b8 5 weeks ago 1.225 GB
gcc 5.1.0 b063030b23b8 5 weeks ago 1.225 GB
gcc latest b063030b23b8 5 weeks ago 1.225 GB
ruby 2.1 236bf35223e7 6 weeks ago 779.8 MB
ruby 2.1.6 236bf35223e7 6 weeks ago 779.8 MB
debian jessie 41b730702607 6 weeks ago 125.1 MB
debian latest 41b730702607 6 weeks ago 125.1 MB
debian 8 41b730702607 6 weeks ago 125.1 MB
debian 8.0 41b730702607 6 weeks ago 125.1 MB
busybox buildroot-2014.02 8c2e06607696 8 weeks ago 2.433 MB
busybox latest 8c2e06607696 8 weeks ago 2.433 MB
What a relief to see this! Work out what images you need to transfer over. In my case it was just the csmall/testdebian one. You need to save it to a tar file.
$ docker save csmall/testdebian > csmall-testdebian.tar.gz
Once you have all your images you want, reboot back to your 4.x kernel. You then need to load each image back into docker.
$ docker load < csmall-testdebian.tar.gz
and then test to see it's there
$ docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
csmall/testdebian latest 6979033105a4 5 weeks ago 368.3 MB
The size of my image was slightly different. I'm not too sure why this is the case but assume it is to do with the storage types. My CI builds now run; they're failing because my test program is trying to link with the library (it shouldn't be), but at least it's not a docker problem.

8 June 2015

Craig Small: Checking Cloudflare SSL

My website for a while has used CloudFlare as its front-end. It's a rather nice setup and means my real server gets less of a hammering, which is a good thing. A few months ago they enabled a feature called Universal SSL which I have also added to my site. Around the same time, my SSL check scripts started failing for the website: the certificate had expired, apparently many many days ago. Something wasn't right.

The Problem

The problem was that I'd get emails saying "The SSL certificate for enc.com.au (CN: ) has expired!". I use a program called ssl-cert-check that checks all (web, smtp, imap) of my certificates. It's very easy to forget to renew, and this program runs daily and does a simple check. Running the program on the command line gave some more information, but nothing (for me) that really helped:
$ /usr/bin/ssl-cert-check -s enc.com.au -p 443
Host Status Expires Days
----------------------------------------------- ------------ ------------ ----
unable to load certificate
140364897941136:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:701:Expecting: TRUSTED CERTIFICATE
unable to load certificate
139905089558160:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:701:Expecting: TRUSTED CERTIFICATE
unable to load certificate
140017829234320:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:701:Expecting: TRUSTED CERTIFICATE
unable to load certificate
140567473276560:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:701:Expecting: TRUSTED CERTIFICATE
enc.com.au:443 Expired -2457182
So, apparently, there was something wrong with the certificate. The problem was that this was CloudFlare, who seem to have a good idea of how to handle certificates, and all my browsers were happy. ssl-cert-check is a shell script that uses openssl to make the connection, so the next stop was to see what openssl had to say.
$ echo ""   /usr/bin/openssl s_client -connect enc.com.au:443 CONNECTED(00000003)
140115756086928:error:14077438:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert internal error:s23_clnt.c:769:
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake has read 7 bytes and written 345 bytes
---
New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
---
No peer certificate available. That was the clue I was looking for.

Where's my Certificate?

CloudFlare Universal SSL uses certificates that have multiple domains in the one certificate. They do this by having one canonical name, which is something like sni(numbers).cloudflaressl.com, and then multiple Subject Alternative Names (a bit like ServerAlias in apache configurations). This way a single server with a single certificate can serve multiple domains. The way the client tells the server which website it is looking for is Server Name Indication (SNI). As part of the TLS handshake the client tells the server "I want website www.enc.com.au". The thing is, by default, both openssl s_client and the check script do not use this feature. That was why the SSL certificate checks were failing: the server was waiting for the client to ask what website it wanted. Modern browsers do this automatically so it just works for them.

The Fix

For openssl on the command line, there is a flag -servername which does the trick nicely:
$ echo ""   /usr/bin/openssl s_client -connect enc.com.au:443 -servername enc.com.au 
CONNECTED(00000003)
depth=2 C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO ECC Certification Authority
verify error:num=20:unable to get local issuer certificate
---
(lots of good SSL type messages)
That made openssl happy: we asked the server what website we were interested in with -servername and got the certificate. The fix for ssl-cert-check is even simpler. Like a lot of things, once you know the problem the solution is not only easy to work out but someone has done it for you already. There is a Debian bug report on this problem with a simple fix from Francois Marier. Just edit the check script and change the line that has:
 TLSSERVERNAME="FALSE"
and change it to true. Then the script is happy too:
$ ssl-cert-check -s enc.com.au -p https
Host Status Expires Days
----------------------------------------------- ------------ ------------ ----
enc.com.au:https Valid Sep 30 2015 114
All working and as expected! This isn't really a CloudFlare problem as such; it is just that this is the first place I had seen these sorts of SNI certificates being used in something I administer (or more correctly, something behind the something).
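As a footnote, this is roughly what -servername is doing under the hood. A hedged C sketch (assuming OpenSSL 1.1 or later and linking with -lssl -lcrypto; this is not code from ssl-cert-check) that sets the SNI hostname before the handshake and prints the peer certificate's subject:

#include <stdio.h>
#include <openssl/ssl.h>
#include <openssl/x509.h>

int main(void)
{
    SSL_CTX *ctx = SSL_CTX_new(TLS_client_method());
    BIO *bio = BIO_new_ssl_connect(ctx);
    SSL *ssl = NULL;

    BIO_get_ssl(bio, &ssl);
    SSL_set_tlsext_host_name(ssl, "enc.com.au");   /* this is the SNI part */
    BIO_set_conn_hostname(bio, "enc.com.au:443");

    if (BIO_do_connect(bio) <= 0) {
        fprintf(stderr, "connect/handshake failed\n");
        return 1;
    }

    X509 *cert = SSL_get_peer_certificate(ssl);
    if (cert) {
        X509_NAME_print_ex_fp(stdout, X509_get_subject_name(cert), 0, 0);
        printf("\n");
        X509_free(cert);
    }
    BIO_free_all(bio);
    SSL_CTX_free(ctx);
    return 0;
}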

16 May 2015

Craig Small: Debian, WordPress and Multi-site

For quite some time, the Debian version of WordPress has had a configuration tweak that made it possible to run multiple websites on the same server. This came from a while ago, when multi-site wasn't available. While a useful feature, it does make the initial setup of WordPress for simple sites more complicated. I'm looking at changing the Debian package slightly so that for single-site use it Just Works. I have also looked into the way WordPress handles the content, especially themes and plugins, to see if there is a way of updating them through the website itself. This probably won't suit everyone but I think it's a better default. The idea will be to set up Debian packages something like this by default and then, if you want fancier stuff, it's all still there, just not set up. It's not set up at the moment, but the default is a little confusing, which I hope to change.

Multisite

The first step was to get my pair of websites into one. So first it was backing up time, and then the removal of my config-websitename.php files in /etc/wordpress. I created a single /etc/wordpress/config-default.php file that used a new database. This initial setup worked ok and I had the primary site going reasonably quickly. The second site was a little trickier. The problem is that multisite does things like foo.example.com and bar.example.com while I wanted example.com and somethingelse.com. There is a plugin, wordpress-mu-domain-mapping, that almost sorta-kinda works. While it let me make the second site with a different name, it didn't like aliases, especially if the alias was the first site. Some evil SQL fixed that nicely: UPDATE wp_domain_mapping SET blog_id=1 WHERE id=2. So now I had both sites working.

Files and Permissions

We really have three separate sets of files in wordpress. These files come from three different sources and are updated in three different ways with different release cycles. The first is the wordpress code which is shipped in the Debian package. All of this code lives in /usr/share/wordpress and is only changed if you update the Debian package, or you fiddle around with it. It needs to be readable to the webserver but not writable. The config files in /etc/wordpress are in this lot too. Secondly, we have the user generated data. This is things like the pictures that you add to the blog. As they are uploaded through the webserver, it needs to be writable to it. These files are located in /var/lib/wordpress/wp-content/uploads Third is the plugins and themes. These can either be unzipped and placed into a directory or directly loaded in from the webserver. I used to do the first way but am trying the second. These files are located in /var/lib/wordpress/wp-content Ever tried to update your plugins and get the FTP prompt? This is because the wp-content directory is not writable. I adjusted the permissions and now when a plugin wants to update, I click yes and it magically happens! You will have to reference the /var/lib/wordpress/wp-content subdirectory in two places.

What broke

Images did, in a strange way. My media library is empty, but my images are still there. Something in the export and reimport did not work. For me it's a minor inconvenience and due to moving from one system to another, but it still is there.

14 May 2015

Craig Small: Hello world!

Welcome to WordPress. This is your first post. Edit or delete it, then start blogging!

11 May 2015

Craig Small: procps using GitLab CI

The procps project for a few years has been hosted at Gitorious. With the announcement that Gitorious has been acquired by GitLab and that all repositories need to move there, procps moved along to GitLab. At first I thought it would just be a like-for-like thing, but then I noticed that GitLab has this GitLab CI feature and had to try it out. CI here stands for Continuous Integration and is a way of automatically testing your program builds using a bunch of test scripts. procps already has a set of tests, with a level of coverage that has room for improvement, so it was a good candidate to use for CI. The way GitLab works is they have a central control point that is linked to the git repo and you create runners, which are the systems that actually compile the programs and run the tests. The runners then feed back their results and GitLab CI shows it all in pretty red or green. The first problem was building this runner. Most of the documentation assumes you are building and testing for Ruby. procps is built using C with the test scripts in TCL, so there was going to be some need to make some adjustments. I chose to use docker containers so that there was at least some isolation between the runners and the base system. I soon found the docker container I used (ruby:2.6 as suggested by GitLab) didn't have autopoint, which meant autogen.sh failed, so I had no configure script, so no joy there. Now my second problem was I had never used docker before and, beyond it being some sort of container thing, like virtual machines lite, I didn't know much about it. The docker docs are very good and soon I had built my own custom docker image that had all the programs I need to compile and test procps. It's basically a cut-down Debian image with a few things like gettext-bin and dejagnu added in.

Docker is not a full system

With that all behind me, and a few "oh I need that too" (don't forget you need git) moments, we had a working CI runner. This was a Good Thing. You then find that your assumptions for your test cases may not always be correct. This is especially noticeable when testing something like procps which needs to work off the proc filesystem. A good example is: what uses session ID 1? Generally it's init or systemd, but in Docker this is what everything runs as. A test case which assumes things about SID=1 will fail, as it did. This probably won't be a problem for testing a lot of normal programs that don't need to dig as deep into the host system as procps does, but it is something to remember. The docker environment looks a lot like a real system from the inside, but there are differences, so the lesson here is to write better tests (or fix the ones that failed, like I did).

The runner and polling

Now, there needs to be communication between the CI website and the runner, so the runner knows there is something for it to do. The gitlab runner has this set up rather well, except the runner is rather aggressive about its polling, hitting the website every 3 seconds. For someone on some crummy Australian Internet with low speeds and download quotas, this can get expensive in network resources. As there is on average an update once a week or so, this seems rather excessive. My fix is pretty manual; actually it's totally manual. I stop the daemon, then if I notice there are pending jobs I start the daemon, let it do its thing, then shut it down again. This is certainly not an optimal solution but works for now. I will look into doing something more clever later, possibly with webhooks.

28 April 2015

Craig Small: Backporting and git-buildpackage

For working with Debian packages, one method of maintaining them is to put them in git and use git-buildpackage to build them right out of the git repository. There are a few pitfalls with it, notably that if you forget to import the upstream you get this strange treeish-related error, which still throws me at first when I see it. Part of maintaining packages is being able to fix security bugs in older versions of them that are found in stable and even sometimes old stable (jessie and wheezy respectively at the time of writing). At first I used to do this outside git, because to me there wasn't a clear way of doing it within it. This is not too satisfactory because it means you lose the benefits of using git in the first place, and for those distributions you are more likely to need collaboration, such as working with the security team or getting help with backporting. So, because I don't think the details are easily found, and also because I'll probably lose them again, I need a central spot to find what I did last time.

Building the environment

The first step for a new distribution is to create the building environment. You really only need to do this once when there is a new distribution, with then possibly some updates from time to time. The command will create the environment in /var/cache/pbuilder/base-DIST.cow/
DIST=wheezy git-pbuilder create
You will find all the files in /var/cache/pbuilder/base-wheezy.cow/ which is then ready to be used. For more information about this useful tool, look at the git-pbuilder entry on the Debian wiki.

Creating the branch

The master branch is generally where sid will be located. For the other distributions, I use branches. For the initial setup you will need to create the branch and start it off from the right point. For example, the last non-security update for wordpress in jessie was version 4.1+dfsg-1. We then want the jessie branch to start from this point.
git branch jessie debian/4.1+dfsg-1
git checkout jessie
This will then create and put you on the jessie branch. You only need the first line once; after that you can switch between sid (master branch) and jessie (jessie branch). At this point you have a working place to make all the updates you need to do. Nothing is terribly different from the usual workflow.

Building the package

Make sure you have checked in all your changes and now it is time to build your package!
git-buildpackage --git-pbuilder --git-dist=jessie --git-debian-branch=jessie
You may need two additional flags:
